Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | i | 26 | säger |
2 | och | 27 | Men |
3 | att | 28 | men |
4 | på | 29 | kan |
5 | som | 30 | så |
6 | en | 31 | ska |
7 | är | 32 | man |
8 | det | 33 | I |
9 | för | 34 | mot |
10 | med | 35 | efter |
11 | har | 36 | vi |
12 | till | 37 | när |
13 | av | 38 | hon |
14 | inte | 39 | sin |
15 | den | 40 | Jag |
16 | om | 41 | år |
17 | ett | 42 | nu |
18 | han | 43 | in |
19 | de | 44 | ut |
20 | Det | 45 | kommer |
21 | var | 46 | under |
22 | sig | 47 | ha |
23 | från | 48 | hade |
24 | – | 49 | Han |
25 | jag | 50 | här |
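The table shows the 50 most frequent words of the corpus. These are usually stopwords, i.e. function words such as conjunctions, prepositions, and pronouns. The counts are case-sensitive, so capitalized and lowercase forms (for example "Det" and "det") appear as separate entries.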
This list is a good candidate for a first stopword list for a language.
Usually, a small, balanced corpus is enough to produce a good list of high-frequency words. However, if the small corpus is dominated by one prominent topic, words from that topic will be visible even among the top-ranked words.
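As a rough illustration of how such a top-word list could be turned into a first stopword list, the following Python sketch folds case, drops non-alphabetic tokens such as the dash, and removes duplicates while keeping the rank order. The function name `stopword_candidates` and the short sample list are assumptions made for this example; they are not part of the corpus tooling.

```python
def stopword_candidates(top_words):
    """Return lowercase, deduplicated, alphabetic words in original rank order."""
    seen = set()
    result = []
    for word in top_words:
        folded = word.lower()
        # Skip punctuation-like tokens such as the dash at rank 24.
        if not folded.isalpha():
            continue
        # Keep only the first (highest-ranked) occurrence of each folded form.
        if folded not in seen:
            seen.add(folded)
            result.append(folded)
    return result


# Hypothetical sample taken from a few rows of the table above.
top_words = ["i", "och", "att", "det", "Det", "–", "jag", "Men", "men", "I", "Jag"]
candidates = stopword_candidates(top_words)
print(candidates)  # ['i', 'och', 'att', 'det', 'jag', 'men']
```

The result is only a candidate list. Topic-specific words that enter the top ranks of a small corpus would still need to be removed by hand.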
SELECT w_id - 100 AS rank_in_wordlist, word FROM words WHERE w_id > 100 ORDER BY w_id LIMIT 50;
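To see what this query does, it can be run against a small in-memory SQLite database from Python. This is a minimal sketch under stated assumptions: the `words` table is assumed to have only the columns `w_id` and `word`, with `w_id` assigned in order of decreasing frequency and the first 100 ids reserved for entries that are not regular words. The reserved range is inferred from the offset in the query, and the placeholder rows are invented for the example.

```python
import sqlite3

# Build a toy database; the schema here is an assumption, not the real corpus schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# Assumption: ids 1..100 are reserved, so real words start at w_id 101.
rows = [(i, f"reserved_{i}") for i in range(1, 101)]
rows += [(101, "i"), (102, "och"), (103, "att")]
conn.executemany("INSERT INTO words VALUES (?, ?)", rows)

# Same query as above: shift w_id by 100 to obtain the rank in the word list.
query = (
    "SELECT w_id - 100 AS rank_in_wordlist, word "
    "FROM words WHERE w_id > 100 ORDER BY w_id LIMIT 50;"
)
top = conn.execute(query).fetchall()
print(top)  # [(1, 'i'), (2, 'och'), (3, 'att')]
conn.close()
```

Subtracting 100 from `w_id` makes the most frequent word start at rank 1, and `LIMIT 50` restricts the output to the top 50 rows.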
3.4 Sample words for different frequency ranges